Abstract: In this paper, we present an analytical tool for
understanding the performance of structured overlay networks
under churn based on the master-equation approach of physics.
We motivate and derive an equation for the average number of
hops taken by lookups during churn, for the Chord network.
We analyse this equation in detail to understand the behaviour
with and without churn. We then use this understanding to
predict how lookups will scale for varying peer population as
well as varying sizes of the routing tables. We then consider
a change in the maintenance algorithm of the overlay, from
periodic stabilisation to a reactive one which corrects fingers only
when a change is detected. We generalise our earlier analysis to
understand how the reactive strategy compares with the periodic
one.



I. Introduction 

A crucial part of assessing the performance of a structured 
P2P system (aka DHT) is evaluating how it copes with churn. 
Extensive simulation is currently the prevalent tool for gaining 
such knowledge. Examples include the work of Li et al. [10], 
Rhea et al. [12], and Rowstron et al. [5]. There have also
been some theoretical analyses, albeit less frequently. For
instance, Liben-Nowell et al. [11] prove a lower bound on the 
maintenance rate required for a network to remain connected 
in the face of a given churn rate. Aspnes et al. [4] give upper 
and lower bounds on the number of messages needed to locate 
a node/data item in a DHT in the presence of node or link 
failures. The value of theoretical studies of this nature is that 
they provide insights neutral to the details of any particular 
DHT. 

We have chosen to adopt a slightly different approach to 
theoretical work on DHTs. We concentrate not on establishing 
bounds, but rather on a more precise prediction of the relevant 
quantities in such dynamically evolving systems. Our approach 
is based mainly on the Master-Equation approach used in the 
analysis of physical systems. We have previously introduced 
our approach in [7], [8], where we presented a detailed analysis
of the Chord system [13]. In this paper, we show that the
approach is applicable to other systems as well. We do this by 
comparing the periodic stabilization maintenance technique of 
Chord with the correction-on-change maintenance technique 
of DKS [3]. 

Due to space limitations, we assume reader familiarity with 
Chord and DKS, including such terminology as successors, 
finger starts and finger nodes etc. 

This work is funded by the 6th FP EVERGROW project. 



The rest of the paper is organised as follows. In Section II
we introduce the Master-Equation approach. In Section III
we mention some related work. In Section IV we begin by
briefly reviewing some of our previously published results on
predicting the performance of the Chord network as a function
of the failed pointers in the system in the case that the nodes
use a periodic maintenance scheme. We then show some new
results on how this complicated equation can be simplified to
get quick predictions for varying number of peers and varying
number of links per node. We relegate some of the details of
this analysis to Appendix VIII. In Section V we explain how
to use the Master-Equation approach to analyse the reactive
maintenance strategy of interest and present our results on how
this strategy compares with the periodic case analysed earlier.
We summarise our results in Section VI.

II. The Master-Equation Approach for 
Structured Overlays 

In a complicated system like a P2P network, in which there 
are many participants, and in which there are many inter-leaved
processes happening in time, predicting the state of
the network (or of any quantity of interest) can at best be 
done by specifying the probability distribution function (PDF) 
of the quantity in the steady state (when the system, though 
changing continually in time, is stationary on average). For 
example, one quantity of interest for us when analysing such 
a network, is the fraction of failed links between nodes, in the 
steady state. This quantity does not take some deterministic 
value in the steady state. Instead it is specified by a PDF, 
which can then be used to determine the average value. The 
problem is thus to calculate the PDF (and then to understand 
how it affects the performance of the network, as explained 
below). 

In general this is not an easy task, since the probability 
is affected by a number of inter-leaved processes in any 
time-varying system. In [7], [8], we demonstrated how we 
could analyse a P2P network like Chord [13], using a Master- 
Equation based approach. This approach is generally used in 
physics to understand a system evolving in time, by means of 
equations specifying the time-evolution of the probabilities of 
finding the system in a specific state. These equations require 
as an input, the rates of various processes affecting the state 
of the system. For example, in a peer-to-peer network, these 
processes could be the join and failure rates of the member 
nodes, the rate at which each node performs maintenance as 
well as the rate at which lookups are done in the network
(the latter rate is relevant only if the lookups affect the state 
of the network in some way). Given these rates, the equation 
for the time-evolution of the probability of the quantity of 
interest can be written by keeping track of how these rates 
affect this quantity (such as the number of failed pointers in 
the system) in an infinitesimal interval of time, when only a 
limited number of processes (typically one) can be expected 
to occur simultaneously. 
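As an illustration of this bookkeeping, the Python sketch below integrates a deliberately simplified toy model. The model is an assumption made here for illustration only and is not one of the equations analysed in this paper: a correct pointer is assumed to go wrong at rate `lam_f`, and a wrong pointer is assumed to be repaired at rate `lam_s`. In a small interval `dt` at most one of the two events is taken to occur, so the fraction of wrong pointers `w` obeys dw/dt = lam_f (1 - w) - lam_s w, whose steady state is 1/(1 + r) with r = lam_s/lam_f.

```python
def steady_state_fraction(lam_f, lam_s, dt=1e-3, steps=20_000):
    """Euler-integrate dw/dt = lam_f*(1 - w) - lam_s*w, starting from w = 0.

    Toy two-process model (hypothetical, for illustration): gain and loss
    terms are accumulated over small intervals dt, as in a master equation.
    """
    w = 0.0
    for _ in range(steps):
        gain = lam_f * (1.0 - w) * dt   # a correct pointer goes wrong
        loss = lam_s * w * dt           # a wrong pointer is repaired
        w += gain - loss
    return w


lam_f, lam_s = 1.0, 50.0                # hypothetical rates, r = lam_s / lam_f = 50
w_numeric = steady_state_fraction(lam_f, lam_s)
w_exact = lam_f / (lam_f + lam_s)       # steady state of the toy model
print(w_numeric, w_exact)
```

The numerical value relaxes to the closed-form steady state, which is the sense in which the steady-state equations later in the paper are obtained by balancing gain and loss terms.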

With this approach, we were able to quantify very accurately 
the probabilities of any connection in the network (either 
fingers or successors) having failed. We then demonstrated
how we could use this information to predict the performance
of the network (the number of hops, including timeouts, which
a lookup takes on average) as a function of the rates (of
join, failure and stabilization) of all the processes happening
in the network, as well as of all the parameters specifying the
network (such as how many pointers a node has on average).
The analysis was done for a specific maintenance strategy,
called periodic maintenance (or eager maintenance).

In this paper, we generalise our approach so as to be able 
to compare networks using different maintenance strategies. 
In particular, we compare our earlier results for periodic 
maintenance with a reactive maintenance strategy proposed 
in [6]. Combining this with some of our previous results, we
are also, as a by-product, able to compare the performance
of networks specified by different numbers of peers, different
numbers of pointers per node and/or different maintenance
strategies. As we show below, which system is better depends 
both on the value of the parameters as well as the level of 
churn. The approach we propose is thus a useful tool for 
the quantitative and fair comparison of networks specified by 
different parameters and using different algorithms. 

III. Related Work 

In [2], an analysis, very similar in spirit to the one done 
in this paper, is carried out in the context of P-Grid [1]. 
An equation is written for system performance in the state 
of dynamic equilibrium for various maintenance strategies. 
However for each maintenance strategy, the analysis has to 
be entirely redone. In contrast, a master equation description 
provides a foundation for the theoretical analysis of overlays, 
which does not have to be entirely rebuilt each time any given 
algorithm is changed. As we show in this paper, we can carry 
over a lot of our earlier analysis, when the maintenance scheme 
is changed from a periodic to a reactive one. In addition, the 
master equation description can be made arbitrarily precise to
include nonlinear effects as well. And, as we show, nonlinear
effects are important when churn is high.

IV. The Lookup Equation for Chord 

We quantify the performance of the network, by the number 
of hops required on average from the originator of the query 
to the node with the answer. This is just the total number of 
nodes contacted per query (or equivalently, the total number 
of pointers used per query) including the total number of 
failed pointers used en route. This latter quantity (which arises 



because of the churn in the network) is the reason that the 
hop count per query increases with high dynamism and is 
hence an important quantity to understand. In the case of the 
periodic maintenance scheme, this quantity is a function of
$(1-\beta)r$, where $r$ is the ratio of the stabilisation rate to the
join (or failure) rate and $1-\beta$ is the fraction of times a
node stabilises its finger, when performing maintenance, as
mentioned in Section II. We demonstrate how this quantity
can be calculated in Section V in the context of the reactive
maintenance policy, which is a simple generalisation of how
it was calculated earlier in [7], [8] for the periodic maintenance
scheme. In this section, we briefly review our earlier results
on how the performance of the network (as exemplified by 
the average hopcount per query), can be determined once the 
fraction of failed pointers is known. 

The key to predicting the performance of the network is to
write a recursive equation for the expected cost $C_t(r,\beta)$ (also
denoted $C_t$) for a given node to reach some target, $t$ keys
away from it. (For example, $C_1$ is the cost of looking up the
adjacent key, which is 1 key away.)

The Lookup Equation for the expected cost of reaching a
general distance $t$ is then derived by following closely the
Chord protocol, which is a greedy strategy designed to reduce
the distance to the query at every step without overshooting the
target. A lookup for $t$ thus proceeds by first finding the closest
preceding finger. The node that this finger points to is then
asked to continue the query, if it is alive. If this node is dead,
the originator of the query uses the next closest preceding
finger and the query proceeds in this manner.
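The greedy procedure just described can be sketched in Python as follows. This is a minimal illustration and not the simulator used for the results in this paper: the function names are hypothetical, finger tables are recomputed on the fly from a sorted list of node identifiers, a node listed in `dead` only causes a timeout (cost 1) when it is tried as a finger, and successor pointers are assumed to be always correct. At least two nodes are assumed.

```python
import bisect


def successor(nodes, key, K):
    """First node at or after `key` on a ring of K keys (`nodes` is sorted)."""
    i = bisect.bisect_left(nodes, key % K)
    return nodes[i % len(nodes)]


def lookup_cost(nodes, start, target, K, m, dead=frozenset()):
    """Hops plus timeouts for a greedy lookup of key `target` from node `start`."""
    n, cost = start, 0
    while True:
        dist = (target - n) % K
        if dist == 0:
            return cost                      # n itself is responsible for the key
        succ = successor(nodes, n + 1, K)
        if dist <= (succ - n) % K:
            return cost + 1                  # final hop to the successor
        tried = set()
        next_hop = succ                      # fallback if no finger is usable
        for k in range(m, 0, -1):            # closest preceding finger first
            finger = successor(nodes, n + 2 ** (k - 1), K)
            if finger in tried or not 0 < (finger - n) % K < dist:
                continue                     # duplicate entry, or it overshoots
            tried.add(finger)
            if finger in dead:
                cost += 1                    # timeout on a dead finger
                continue
            next_hop = finger
            break
        cost += 1                            # one hop to the chosen node
        n = next_hop


K, m = 64, 6
nodes = list(range(0, K, 8))                 # eight evenly spaced nodes
print(lookup_cost(nodes, 0, 60, K, m))               # no churn
print(lookup_cost(nodes, 0, 60, K, m, dead={32}))    # largest finger of node 0 is dead
```

In the second call the lookup pays one timeout for the dead finger and then continues through the next closest preceding finger, which is the effect that the failed-pointer terms in the Lookup Equation account for.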

For the purposes of the analysis, it is easier to think in terms
of the closest preceding start. Let us hence define $\xi$ to be the
start of the finger (say the $k$-th) that most closely precedes $t$.
Hence $\xi = 2^{k-1} + n$ and $t = \xi + m$, i.e. there are $m$ keys
between the sought target $t$ and the start of the most closely
preceding finger. With that, we can write a recursion relation
for $C_{\xi+m}$ as follows:

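$$
\begin{aligned}
C_{\xi+m} ={}& C_{\xi}\,[1 - a(m)] \\
&+ (1 - f_k)\,a(m)\left[1 + \sum_{i=0}^{m-1} bc(i,m)\,C_{m-i}\right] \\
&+ f_k\,a(m)\left[1 + \sum_{i=1}^{k-1} h_k(i) \sum_{l=0}^{\xi/2^i - 1} bc(l,\xi/2^i)\,\bigl(1 + (i-1) + C_{\xi_i - l + m}\bigr) + O(h_k(k))\right]
\end{aligned}
\qquad (1)
$$

where $\xi_i = \sum_{m=1}^{i} \xi/2^m$ and $h_k(i)$ is the probability that
a node is forced to use its $(k-i)$-th finger owing to the death
of its $k$-th finger.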

The probabilities $a$, $bc$ can be derived from the internode
interval distribution [7], [8], which is just the distribution of
distances between adjacent nodes. Given a ring of $\mathcal{K}$ keys
and $N$ nodes (on average), where nodes can join and leave
independently, the probability that two adjacent nodes are a
distance $x$ apart on the ring is simply $P(x) = \rho^{x-1}(1-\rho)$,
where $\rho = (\mathcal{K}-N)/\mathcal{K}$. Using this distribution, it is easy to estimate
the probability that there is at least one node in an
interval of length $x$. This is $a(x) = 1 - \rho^{x}$. The probability
that the first node encountered from any key is at a distance
$i$ from that key is then $b(i) = \rho^{i}(1-\rho)$. Hence the conditional
probability that the first node from a given key is at a distance $i$,
given that there is at least one node in the interval, is $bc(i,x) =
b(i)/a(x)$.
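The interval probabilities above can be checked numerically with a short Python sketch. It assumes $\rho = 1 - N/\mathcal{K}$, so that $1-\rho$ is the density of peers; the helper name `make_interval_probs` and the choice of interval length are hypothetical and only serve the illustration.

```python
def make_interval_probs(N, K):
    """Return (P, a, b, bc) for a ring of K keys and N nodes.

    Assumes rho = 1 - N/K, i.e. 1 - rho is the density of peers.
    """
    rho = 1.0 - N / K

    def P(x):
        # adjacent nodes are exactly x keys apart (x >= 1)
        return rho ** (x - 1) * (1 - rho)

    def a(x):
        # at least one node in an interval of length x
        return 1 - rho ** x

    def b(i):
        # first node encountered from a key is at distance i (i >= 0)
        return rho ** i * (1 - rho)

    def bc(i, x):
        # same as b(i), conditioned on the interval of length x being non-empty
        return b(i) / a(x)

    return P, a, b, bc


P, a, b, bc = make_interval_probs(N=1000, K=2 ** 20)
x = 64
print(sum(bc(i, x) for i in range(x)))            # conditional distribution sums to 1
print(sum(P(d) for d in range(1, x + 1)), a(x))   # both equal 1 - rho**x
```

The two printed checks express that $bc(\cdot,x)$ is a properly normalised distribution on an interval of length $x$, and that $a(x)$ is the cumulative probability of the internode distance.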

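The probability $h_k(i)$ is easy to compute given the probability
$a$ as well as the probabilities $f_k$ of the $k$-th finger being
dead:

$$
\begin{aligned}
h_k(i) &= a(\xi/2^i)\,(1 - f_{k-i}) \prod_{s=1}^{i-1} \bigl(1 - a(\xi/2^s) + a(\xi/2^s)\,f_{k-s}\bigr), \quad i < k \\
h_k(k) &= \prod_{s=1}^{k-1} \bigl(1 - a(\xi/2^s) + a(\xi/2^s)\,f_{k-s}\bigr)
\end{aligned}
\qquad (2)
$$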

Eqn. (2) accounts for all the reasons that a node may have to
use its $(k-i)$-th finger instead of its $k$-th finger. This could happen
because the intervening fingers were either dead or not distinct
(fingers $k$ and $k-1$ are not distinct if they have the same entry
in the finger table. Though the starts of the two fingers are
different, if there is no node in the interval between the starts,
the entry in the finger table will be the same). The probabilities
$h_k(i)$ satisfy the constraint $\sum_{i=1}^{k} h_k(i) = 1$. $h_k(k)$ is the
probability that a node cannot use any earlier entry in its finger
table, in which case it has to fall back on its successor list
instead. We indicate this case by the last term in Eq. (1), which
is $O(h_k(k))$. In practice, the probability for this is extremely
small except for targets very close to $n$. Hence this does not
significantly affect the value of general lookups and we ignore
it for the moment.
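The fallback probabilities and their normalisation can be illustrated with the following Python sketch. It assumes $a(x) = 1 - \rho^{x}$, takes $\xi$ as the distance $2^{k-1}$ to the start of the $k$-th finger, and uses a single made-up value for every $f_j$; the function name and all numerical values are hypothetical.

```python
def a(x, rho):
    """Probability of at least one node in an interval of length x."""
    return 1.0 - rho ** x


def finger_fallback_probs(k, xi, f, rho):
    """Return [h_k(1), ..., h_k(k)]; f[j] is the probability that finger j is dead."""
    probs = []
    blocked = 1.0                      # fingers k-1, ..., k-i+1 all unusable so far
    for i in range(1, k):
        a_i = a(xi / 2 ** i, rho)
        # finger k-i is distinct (prob. a_i) and alive (prob. 1 - f[k-i])
        probs.append(blocked * a_i * (1.0 - f[k - i]))
        # finger k-i is either not distinct or distinct but dead
        blocked *= 1.0 - a_i + a_i * f[k - i]
    probs.append(blocked)              # h_k(k): no earlier finger is usable
    return probs


k = 12
rho = 1.0 - 1000 / 2 ** 20             # N = 1000 nodes on 2**20 keys (assumed)
f = [0.1] * (k + 1)                    # hypothetical failure probability per finger
h = finger_fallback_probs(k, xi=2 ** (k - 1), f=f, rho=rho)
print(sum(h))                          # the h_k(i) sum to 1
```

The running product makes the telescoping structure explicit: each term is the probability that all closer fingers were unusable times the probability that the current one is usable, so the terms add up to one.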

The cost for general lookups is

$$
L(r,\beta) = \frac{1}{\mathcal{K}} \sum_{i=1}^{\mathcal{K}-1} C_i(r,\beta).
$$



The lookup equation is solved recursively numerically, using
the expressions for $a$, $bc$, $h_k(i)$ and $C_1$. In Fig. 1 we have
plotted the theoretical prediction of Equation (1) versus what
we get from simulating Chord. Here we have used $N = 1000$
and $\mathcal{K} = 2^{20}$. As can be seen, the theoretical results match
the simulation results very well.

In Fig. 2 we also show the theoretical predictions for some
larger values of $N$.

On general grounds, it is easy to argue from the structure
of Equation (1) that the dependence of the average lookup
on churn comes entirely from the presence of the terms $f_k$.
Since $f_k \approx f$ independent of $k$ for large fingers, we can
approximate the average lookup length by the functional form
$L(r,\beta) = A + Bf + Cf^2 + \cdots$. The coefficients $A$, $B$, $C$ etc.
can be recursively computed by solving the lookup equation to
the required order in $f$. They depend only on $N$, the number of
nodes, $1-\rho$, the density of peers, and $b$, the base or equivalently
the size of the finger table of each node. The advantage of
writing the lookup length this way is that churn-specific details,
such as how new joinees construct a finger table or how
exactly stabilizations are done in the system, can be isolated
in the expression for $f$. If we were to change our stabilization
strategy, as we will demonstrate below, we could immediately
estimate the lookup length by plugging in the new expression
for $f$ in the above relation.

Another advantage of having a simple expression such as
the above is that if we can estimate $A, B, C, \dots$ accurately,
we can make use of the expression for $L$ to estimate the churn
(or the value of $r$) in the system, hence using a local measure
to estimate a global quantity. The logic in doing so is the
inverse of the reasoning we have used so far. So far, we have
used the churn as the input for finding $f_k$ and hence $L$. But
we can also reverse the logic and try to estimate churn, if
we know the value of the average lookup length $L$. If $L$ has
the above simple expression, then given $A$ and $B$, to $O(f)$
we have $f = (L - A)/B$. From the expression for $f$ (see Section V
for how to evaluate $f$), we can now get the value of $r$. Hence
any peer can make an estimate of the churn that the system is
facing if it knows how long its lookups are taking on average,
and if it has an estimate of $N$.

To get $A$, we need to consider Eqn. (1) with no churn (all $f_k$'s
set to zero). In Appendix VIII we study the lookup equation (1)
in some detail to understand the behaviour without churn and
obtain the value of $A$ for any base $b$. This is useful on several
counts. First, the value of $A$ is needed to predict the lookup
costs as explained above. Secondly, if $b$ changes (a system
of base $b$ has a finger table of size $\mathcal{M} = (b-1)\log_b(\mathcal{K})$),
all else remaining the same, the only major change in the
lookup cost is due to the change in $A$. So estimating $A$
precisely has the benefit that we can predict the lookup cost
for any base $b$. Thirdly, the analysis confirms that Equation
(1) does indeed reproduce well known results for the lookup
hop count in Chord, such as, for example, that the average
lookup cost is $0.5\log(N)$ without churn [13]. In fact, as
demonstrated in Appendix VIII, for any $N$ the average lookup
cost as predicted by Eq. (1) is indeed $0.5\log(N)$ plus some
$\rho$-dependent corrections which, though small, are accurately
predicted.

A simple estimate for $B$ and $C$ can be made in the following
manner. Let every finger be dead with some finite probability
$f$. Each lookup encounters on average $A$ fingers, where $A$ is
the average lookup length without churn. Each of these fingers
could be alive (in which case it contributes a cost of 1), or dead
with a probability $f$, in which case it contributes a cost of 2 if
the next finger chosen is alive (with probability $1 - f$), and so
on. It is trivial to verify that this estimates the lookup cost to
be $A(1 + f + f^2 + \cdots)$. Comparing with our expression for
$L$, this gives an estimate of $B = A$, $C = A, \dots$.
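The series in this estimate can be verified with a few lines of Python. The sketch assumes, as in the argument above, that each finger tried is dead independently with probability `f`, so that a step costs n + 1 with probability f^n (1 - f); the function name and the truncation at 200 terms are choices made here for illustration.

```python
def expected_cost_per_finger(f, max_tries=200):
    """Expected cost of one routing step when each finger tried is
    independently dead with probability f (series truncated at max_tries)."""
    # n dead fingers (one timeout each) followed by one live finger costs n + 1
    return sum((n + 1) * f ** n * (1 - f) for n in range(max_tries))


f = 0.1
cost = expected_cost_per_finger(f)
series = sum(f ** n for n in range(200))   # 1 + f + f^2 + ...
print(cost, series, 1 / (1 - f))
```

All three numbers agree, so multiplying by the $A$ fingers encountered on average reproduces the estimate $A(1 + f + f^2 + \cdots)$.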

In general if $L = A + B\,g(f)$, then if we scale $L$ by plotting
$(L - A)/B$ for varying $N$, we should get an estimate of $g(f)$.
Note that $f$ depends on $\rho$ and $\mathcal{M}$, the number of fingers. In
addition if $g(f) = a_1 f + a_2 f^2 + \cdots$, the coefficients $a_1, a_2$,
etc. can also depend on $\rho$. However for $1 - \rho \ll 1$, these
dependences on $\rho$ are small and the curves for different $N$
collapse onto the same curve on scaling. In Fig. 3 we have
scaled the curves plotted in Fig. 2 in the above manner, using
$B = A$. The values of $A$ used are derived from the analysis
of the previous section. As can be seen, the curves collapse
onto one curve which is well approximated by the function
$g(f) = f + 3f^2$, giving $a_1 = 1$ and $a_2 = 3$. The fits in
Fig. 2 are also according to this functional form. It should be
emphasized however that this approximation for $g(f)$ is good
only for $1 - \rho \ll 1$. For higher values of peer density, the
curves for different $N$ will not collapse onto one curve and
any $\rho$-dependence of the coefficients $a_i$'s will show up as well.
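As a small worked example of using this fitted form, the Python sketch below evaluates $L = A(1 + f + 3f^2)$ and inverts it to recover $f$ from a measured average lookup length, which is the local-to-global estimate discussed earlier. It assumes $B = A$ and the fitted coefficients $a_1 = 1$, $a_2 = 3$; the function names and the value $f = 0.2$ are hypothetical, while $A = 5.846$ is the no-churn value quoted for $N = 1000$ and base 2.

```python
import math


def g(f):
    """Fitted scaling function g(f) = f + 3 f^2 (valid for low peer density)."""
    return f + 3 * f ** 2


def lookup_cost_estimate(A, f):
    """Quick estimate L = A (1 + g(f)), assuming B = A."""
    return A * (1 + g(f))


def estimate_f(L, A):
    """Invert L = A (1 + f + 3 f^2) for the non-negative root f."""
    s = L / A - 1                      # scaled cost (L - A) / B with B = A
    return (-1 + math.sqrt(1 + 12 * s)) / 6


A = 5.846                              # no-churn lookup length for N = 1000, b = 2
L = lookup_cost_estimate(A, 0.2)       # hypothetical fraction of dead fingers
print(L, estimate_f(L, A))
```

To first order in $f$ the inversion reduces to $f = (L - A)/B$; the quadratic root simply keeps the next term of the fit.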

We can use the above functional form to predict how
lookups would behave if we change the base $b$ (the size of the
routing table) of the system. In Fig. 4 we plot the functional
form $A(b)\,(1 + f(b) + 3f(b)^2)$ for $b = 2, 4, 16$. The coefficient
$A(b)$ is accurately predicted by Eq. (11) in Appendix VIII, with
the definition of + 1) taken appropriately. $f(b)$ is affected
by the base $b$ because the number of fingers increases with $b$.

As can be seen, when churn is low, a large b is an advantage 
and significantly improves the lookup length. However when 
churn is high, the flip side of having a larger routing table is 
that it needs more maintenance. Hence beyond some value of 
churn, the larger the value of b, the larger the lookup latency. 

This is similar to the spirit of the numerical investigations 
done in [9]. However when comparing different bases for 
Chord, Li et al. [9] find that while base 2 is the best for
high churn (as we find here), base 8 is the best for low churn. 





[Figure: scaled lookup cost plotted against $(1-\beta)r$ for N = 1000, 2000, 4000, 8000 and 16000, together with the fit $f + 3f^2$.]

Fig. 3

Scaled lookup cost, for N = 1000, 2000, 4000, 8000, 16000 peers.





[Figure: lookup cost $L = A(b)\,(1 + f + 3f^2)$, with $f$ evaluated at $(1-\beta)r$; $A(b) = 5.846$ for $b = 2$, $A(b) = 4.8832$ for $b = 4$, $A(b) = 3.6855$ for $b = 16$.]

Fig. 4

Lookup cost, for N = 1000 peers for base b = 2, 4, 16.



Increasing the base beyond this does not seem to improve the 
cost. The discrepancy between this finding and ours is due to 
the details of the periodic maintenance scheme which we use. 
In our case, we have taken the simplest scenario in which each
node needs to stabilise $\mathcal{M}$ fingers and the order in which this
is done is random. In practice only $\sim \log N$ of the $\mathcal{M}$ fingers
are distinct, so only $\sim \log N$ stabilisations need be done by
each node. In addition, in [9], finger stabilisations are done
only if the finger is pinged and found to be dead.

V. 'Correction-on-Change' Maintenance 
Strategy 

In this section, we analyse a different maintenance strategy 
using the master-equation formalism. The strategy we have 
analysed so far is periodic stabilisation of successors as well as 
fingers. We now consider a strategy where a node periodically 
stabilises its successors but does not do so for its fingers. 
Instead, for maintaining its fingers, it relies on other nodes for 
updates [6]. Whenever a node $n$ detects that its first successor
$n.s_1$ is wrong (failed or incorrect), it sends out messages to
all the nodes that are pointing to its wrong first successor, so
that they can update their affected finger. The node sending
messages can either do so by broadcasting these messages to
all affected nodes simultaneously, or by scheduling messages
periodically at some rate. We analyse the latter option in this
paper, since it provides a more intuitive and broader framework
for the comparison of the two schemes.

For a system with id-size $\mathcal{K}$, there are of the order of $\mathcal{M} =
\log_2 \mathcal{K}$ fingers pointing to any node (there can be more than
this if node spacings are smaller than average. However, as we
argue below, for our purpose this is not important). Of course,
not all $\mathcal{M}$ of these fingers are distinct. Several of these fingers
belong to node $n$ itself. However, to keep the analysis simple
(and in keeping with the spirit of our analysis of the periodic
stabilisation scheme), we assume that every node that detects a
wrong successor needs to send out exactly $\mathcal{M}$ messages (even
if some of these 'messages' are sent to itself).

To find out where the nodes that point to $n.s_1$ are located,
$n$ needs to do a lookup. For example, to find the node with
the $k$-th finger pointing to $n.s_1$, $n$ can do a lookup for the id
$n - 2^{k-1}$. On obtaining the first successor (let us call it node
$p$) of this id, it would immediately know if the $k$-th finger of
$p$ indeed needs to be updated. We think of each lookup as
a 'correction message'. If there is more than one node that
needs its $k$-th finger updated (because, for example, the
successors of $p$ also happen to point to $n.s_1$), $n$ could leave the
responsibility of informing these other nodes to $p$. We could
take into account the probability that a correction action leads
to more than $\mathcal{M}$ messages. But for the moment we ignore
this point. (We could argue that once it is $p$'s responsibility
to check that its successors know about $n.s_1$, it could piggyback
this information when it does a successor stabilisation,
which does not affect the number of messages sent.)

Whenever a node receives a message updating its informa- 
tion about a finger, it immediately corrects the appropriate 
entry in its routing table. 

In the following, we demonstrate how we can analyse such a
strategy. We would like to ultimately compare its performance
to periodic stabilisation in the face of churn. To make such a
comparison meaningful, we need to quantify the concept of
'maintenance-effort' per node, and compare the two schemes
at a given level of churn and at the same value of the
maintenance effort per node. We elaborate on this a little later
in Section V-B.

Another point to note is how to quantify system performance.
We have previously done it in terms of lookup hops.
But a more correct way might be to ask for the latency for
consistent lookups (since some of the lookups could be inconsistent).
However, we have checked that, within our analytical
framework, this does not change the results qualitatively.

A. Analysis of the Correction-on-change strategy

To generalise the analysis to meet the situation when some
nodes are sending messages while others are not, we say that a
node can be in state $S_1$ or $S_2$. In state $S_1$, a node can stabilise
its first successor at rate $\alpha\lambda_s$, fail at rate $\lambda_f$ and assist in
joins at rate $\lambda_j$ as before. In state $S_2$, a node can stabilise
its first successor at rate $a\lambda_s$, fail at rate $\lambda_f$, assist in joins
at rate $\lambda_j$ and, in addition, send correction messages (which is
essentially equivalent to doing one lookup) at rate $\lambda_M = c\lambda_s$.
As we show in Section V-B, if we want to compare the two
maintenance strategies in a fair manner then the most general
values that these parameters can take are $\alpha = 1$ and $a + c = 1$.

TABLE I

Gain and loss terms for $N_{S_1}$, the number of nodes in state $S_1$.

Change in $N_{S_1}(t)$                        Probability of occurrence
$N_{S_1}(t+\Delta t) = N_{S_1}(t) - 1$        $c_{1.1} = \lambda_f N_{S_1} \Delta t$
$N_{S_1}(t+\Delta t) = N_{S_1}(t) + 1$        $c_{1.2} = \lambda_j N \Delta t$
$N_{S_1}(t+\Delta t) = N_{S_1}(t) + 1$        $c_{1.3} = \lambda_M N_{S^{\mathcal{M}}} \Delta t$
$N_{S_1}(t+\Delta t) = N_{S_1}(t) - 1$        $c_{1.4} = \alpha\lambda_s N_{S_1} w_1 \Delta t$
$N_{S_1}(t+\Delta t) = N_{S_1}(t)$            $1 - (c_{1.1} + c_{1.2} + c_{1.3} + c_{1.4})$

Let $N_{S_1}$ be the number of nodes in state $S_1$ and $N_{S_2}$ the
number of nodes in state $S_2$. Clearly $N_{S_1} + N_{S_2} = N$, the
total number of nodes in the system.

We can further partition $S_2$ into $S^1, S^2, S^3, \dots, S^{\mathcal{M}}$. $S^1$ is
the state of the node which has yet to send its first correction
message, $S^2$ the state of the node which has sent its first
correction message but is yet to send its second, etc.

Consider the gain and loss terms for $N_{S_1}$. These are
summarised in Table I.

Term $c_{1.1}$ is the probability that an $S_1$ node is lost because
it failed. Term $c_{1.2}$ is the probability that a join occurs, thus
adding to the number of $S_1$ nodes in the system (since a new
joinee is always an $S_1$-type node). Term $c_{1.3}$ is the probability
that an $S^{\mathcal{M}}$ node sent its last message at rate $\lambda_M$ and converted
into an $S_1$ node. The last term $c_{1.4}$ is the probability that an
$S_1$-type node did a stabilisation at rate $\alpha\lambda_s$, found a wrong
first successor with probability $w_1$ and hence converted into
an $S_2$ node. $w_1$ is the fraction of wrong successor pointers of
an $S_1$-type node.

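Defining $\lambda_s/\lambda_f = r$ and $\lambda_M/\lambda_j = cr$ (with $\lambda_j = \lambda_f$ in the
steady state), the steady-state equation predicted by Table I is:

$$
P_{S_1}\,(1 + \alpha r w_1) = 1 + c r\,P_{S^{\mathcal{M}}} \qquad (3)
$$

where $P_{S_1} = N_{S_1}/N$, and similarly for the other states.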

We can write a similar equation for $N_{S_2}$, which however does
not give us any new information since $N_{S_1} + N_{S_2} = N$.

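Writing a gain-loss equation for each of the $N_{S^i}$'s in turn,
we obtain

$$
P_{S^1} = \frac{P_{S_1}\,(\alpha r w_1 - a r w_1') + a r w_1'}{1 + cr + a r w_1'} \qquad (4)
$$

and

$$
P_{S^i} = \frac{cr}{1 + cr + a r w_1'}\,P_{S^{i-1}} \qquad (5)
$$

for $2 \le i \le \mathcal{M}$.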

Here $w_1$ is the fraction of $S_1$ nodes with wrong pointers
and $w_1'$ is the fraction of $S_2$ nodes with wrong pointers. We
have made a simplification here in assuming that the fraction
of wrong pointers of $S_2$ nodes is the same, irrespective of the
state of the $S_2$ node. In practice (especially if $a = 0$), this
will not be the case. However, for the parameter ranges we are
interested in ($r \gg 1$), this is not crucial.
TABLE II

Gain and loss terms for $W_T$: the total number of wrong first
successor pointers in the system.

Change in $W_T(t)$                      Probability of occurrence
$W_T(t+\Delta t) = W_T(t) + 1$          $c_{2.1} = (\lambda_j N \Delta t)(1 - w)$
$W_T(t+\Delta t) = W_T(t) + 1$          $c_{2.2} = (\lambda_f N \Delta t)(1 - w)^2$
$W_T(t+\Delta t) = W_T(t) - 1$          $c_{2.3} = (\lambda_f N \Delta t)\,w^2$
$W_T(t+\Delta t) = W_T(t) - 1$          $c_{2.4} = (\alpha\lambda_s \Delta t) N_{S_1} w_1 + (a\lambda_s \Delta t) N_{S_2} w_1'$
$W_T(t+\Delta t) = W_T(t)$              $1 - (c_{2.1} + c_{2.2} + c_{2.3} + c_{2.4})$

TABLE III

Gain and loss terms for $W_1'$: the number of wrong first
successor pointers of $S_2$-type nodes.

Change in $W_1'(t)$                     Probability of occurrence
$W_1'(t+\Delta t) = W_1'(t) + 1$        $c_{2.1} = (\lambda_j N_{S_2} \Delta t)(1 - w_1')$
$W_1'(t+\Delta t) = W_1'(t) + 1$        $c_{2.2} = \lambda_f N_{S_2}\bigl((1 - w_1')^2 P_{S_2} + (1 - w_1)(1 - w_1') P_{S_1}\bigr)\Delta t$
$W_1'(t+\Delta t) = W_1'(t) - 1$        $c_{2.3} = \lambda_f N_{S_2}\bigl(w_1'^2 P_{S_2} + w_1 w_1' P_{S_1}\bigr)\Delta t$
$W_1'(t+\Delta t) = W_1'(t) - 1$        $c_{2.4} = a\lambda_s N_{S_2} w_1' \Delta t$
$W_1'(t+\Delta t) = W_1'(t) - 1$        $c_{2.5} = \lambda_M N_{S^{\mathcal{M}}} w_1' \Delta t$
$W_1'(t+\Delta t) = W_1'(t)$            $1 - (c_{2.1} + c_{2.2} + c_{2.3} + c_{2.4} + c_{2.5})$

TABLE IV

The relevant gain and loss terms for $F_k$, the number of nodes
whose $k$-th fingers are pointing to a failed node, for $k > 1$.



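Clearly $\sum_{i=1}^{\mathcal{M}} P_{S^i} = P_{S_2}$. A quantity of interest in our
analysis is

$$
\frac{P_{S^{\mathcal{M}}}}{P_{S_2}} = \frac{q^{\mathcal{M}-1}\,(1 - q)}{1 - q^{\mathcal{M}}} \qquad (6)
$$

where $q = \dfrac{cr}{1 + cr + a r w_1'}$.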

To solve for $P_{S_1}$ etc., we need to solve for $w_1$ and $w_1'$.

However, consider first the equation for $W_T$, the total
number of wrong successor pointers in the system (irrespective
of whether the pointer belongs to an $S_1$ or an $S_2$ type node).
The gain and loss terms for $W_T$ are shown in Table II.
$w = W_T/N$ is the fraction of wrong successor pointers in
the system.

This gives the following equation:

$$
(3 + \alpha r)\,w_1 P_{S_1} + (3 + a r)\,w_1' P_{S_2} = 2 \qquad (7)
$$
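As a consistency check, the balance of gain and loss terms can be solved numerically in a simplified special case. The Python sketch below assumes that both node types share the same fraction of wrong pointers ($w_1 = w_1' = w$) and the same stabilisation rate ($\alpha = a$), and that the failure-induced loss term is proportional to $w^2$; these simplifications and the function names are assumptions made here for illustration. Under them the balance reduces to $w = 2/(3 + \alpha r)$.

```python
def net_gain(w, alpha_r):
    """Gain minus loss of wrong successor pointers, per node, in units of lam_f.

    Simplified case: one fraction w for all nodes, one stabilisation rate.
    """
    gain = (1 - w) + (1 - w) ** 2      # joins + failures creating a wrong pointer
    loss = w ** 2 + alpha_r * w        # failures removing one + stabilisations
    return gain - loss


def solve_w(alpha_r, tol=1e-12):
    """Bisection for the steady state net_gain(w) = 0 on [0, 1]."""
    lo, hi = 0.0, 1.0                  # net_gain(0) = 2 > 0 and net_gain(1) < 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_gain(mid, alpha_r) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


alpha_r = 100.0                        # hypothetical value of alpha * r
w_star = solve_w(alpha_r)
print(w_star, 2 / (3 + alpha_r))
```

The quadratic terms cancel in the balance, which is why the steady-state relation is linear in the fractions of wrong pointers.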



The gain and loss terms for $W_1'$, the number of $S_2$ nodes with
wrong successor pointers, are written in much the same way
except for a few small changes. Table III details the changes
that occur in $W_1'$ in time $\Delta t$.

The terms here are much the same as derived earlier except
that we now have to keep track of whether the node that is
failing (in terms $c_{2.2}$ and $c_{2.3}$) is an $S_1$ or an $S_2$-type node.
In addition, term $c_{2.5}$ is the probability that an $S^{\mathcal{M}}$-type node
has a wrong successor pointer, but sends a message and hence
turns into an $S_1$ node with a wrong pointer.

Table III gives us the following equation for $w_1'$ in the steady
state:

$$
w_1'\left(3 + ar + cr\,\frac{P_{S^{\mathcal{M}}}}{P_{S_2}}\right) + (w_1 - w_1')\,P_{S_1} = 2 \qquad (8)
$$



We can write a similar equation for $w_1$, which however
does not contain any new information since $w_1$ and $w_1'$ satisfy
Equation (7).

So in effect we have three equations, Eq. (3), Eq. (7) and
Eq. (8), for three unknowns $P_{S_1}$, $w_1$ and $w_1'$. In practice this set



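Probability of occurrence

$c_{3.1} = (\lambda_j N \Delta t)\sum_{i=1}^{k} p_{\mathrm{join}}(i,k)\,f_i$
$c_{3.2} = \dfrac{f_k}{\sum_k f_k}\,\lambda_M N_{S_2}\,(1 - w_1')\,A(w_1, w_1')\,\Delta t$
$c_{3.3} = (1 - f_k)^2\,[1 - p_1(k)]\,(\lambda_f N \Delta t)$
$c_{3.4} = (1 - f_k)^2\,(p_1(k) - p_2(k))\,(\lambda_f N \Delta t)$
$c_{3.5} = (1 - f_k)^2\,(p_2(k) - p_3(k))\,(\lambda_f N \Delta t)$
$1 - (c_{3.1} + c_{3.2} + c_{3.3} + c_{3.4} + c_{3.5})$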



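of equations is very hard to solve exactly because of the
appearance of terms such as $q^{\mathcal{M}}$ in Eq. (6).

In the following we will solve the set of equations to $O(1/r)$
by expanding Eq. (6) to first order in $w_1'$. In this case,

$$
\frac{P_{S^{\mathcal{M}}}}{P_{S_2}} \approx \frac{1}{\mathcal{M}}\left[1 - \frac{\mathcal{M}-1}{2}\left(\frac{a\,w_1'}{c} + \frac{1}{cr}\right)\right] \qquad (9)
$$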
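The quality of such a first-order expansion can be gauged numerically. The Python sketch below assumes that the sub-state populations form a geometric sequence with ratio $q = cr/(1 + cr + a r w_1')$, and compares the exact fraction of $S_2$ nodes in the last sub-state with its expansion to first order in $\varepsilon = (1 + a r w_1')/(cr)$. The function names and all parameter values are hypothetical.

```python
def last_substate_fraction(q, M):
    """Fraction of S_2 nodes in the last of M sub-states when each sub-state
    population is q times the previous one (geometric sequence, assumed)."""
    weights = [q ** (i - 1) for i in range(1, M + 1)]
    return weights[-1] / sum(weights)


def first_order_fraction(a, c, r, w1p, M):
    """Expansion of the same fraction to first order in eps = (1 + a r w1')/(c r)."""
    eps = (1 + a * r * w1p) / (c * r)
    return (1 - (M - 1) * eps / 2) / M


# hypothetical parameters: a + c = 1, r >> 1, small fraction of wrong pointers
a, c, r, w1p, M = 0.2, 0.8, 1000.0, 0.002, 20
q = c * r / (1 + c * r + a * r * w1p)
exact = last_substate_fraction(q, M)
approx = first_order_fraction(a, c, r, w1p, M)
print(exact, approx)
```

For $r \gg 1$ the two values agree closely, and both are slightly below $1/\mathcal{M}$, the value obtained when no node leaves $S_2$ before sending all of its messages.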



We can now solve the set of three coupled equations to
get a quartic equation for $w_1'$ as a function of $\alpha$, $a$, $\mathcal{M}$ and
$r$. Only one of the roots of the quartic equation is a true
solution satisfying all the conditions above. The details of the
calculations, though straightforward, are tedious and not shown
here.

To calculate the cost of lookups, we still need to calculate
the probability that a finger is dead. The loss and gain terms
for this calculation are almost exactly the same as carried out
earlier in [7], [8] (except for term $c_{3.2}$) and are shown in
Table IV.

The term $c_{3.2}$ is the probability that a message is sent
($\lambda_M N_{S_2}$) times the probability that a $k$-th pointer gets this
message (with probability $f_k/\sum_k f_k$, since only nodes with
wrong pointers get the messages), times the probability that
the message is not outdated ($1 - w_1'$), times the probability that
the predecessor of the node which has to receive the message
has a correct successor pointer. This last quantity is denoted
by $A(w_1, w_1') = 1 - (w_1 P_{S_1} + w_1' P_{S_2})$, since the predecessor
could have been an $S_1$ or an $S_2$ type node.

An estimate for $\sum_k f_k$ is simply $\approx \mathcal{M} N_{S_2}/N$. Substituting
this in term $c_{3.2}$, this term becomes
$\lambda_M N \Delta t\,(f_k/\mathcal{M})\,(1 - w_1')\,A(w_1, w_1')$.

Solving for $f_k$ in the steady state, and substituting for $w_1'$,
we get $f_k$ as a function of the parameters. As mentioned
earlier, a quick and precise estimate of the lookup length is
then obtained by taking $L = A(1 + f + 3f^2)$.

B. Comparison of Correction-on-change and Periodic Stabilisation

In order to compare how the two strategies perform under 
churn, we need to make sure that we are comparing lookup 
latencies for the same number of total maintenance messages 
sent. 

Fig. 5. Comparison of the lookup cost for the two maintenance strategies, for $N = 1000$. Curves: periodic stabilisation; reactive stabilisation ($a = 0$, $c = 1$).

Fig. 6. Comparison of the lookup cost for different values of the parameter $a$, as explained in the text. Curves: $(a, c) = (0, 1)$, $(0.1, 0.9)$, $(0.2, 0.8)$, $(0.3, 0.7)$, $(0.4, 0.6)$, $(0.5, 0.5)$, $(0.6, 0.4)$.

Let us assume that the maximum rate for sending messages per node is $C$. In the case of periodic stabilisation, this implies that the rate of doing successor stabilisations $\lambda_{s_1}$ and finger stabilisations $\lambda_{s_2}$ must in total not exceed $C$. This implies that $\lambda_{s_1}/C + \lambda_{s_2}/C \le 1$. If we assume that all nodes always send messages up to their maximum capacity, then clearly $\lambda_{s_1}/C + \lambda_{s_2}/C = 1$. Suppose we define $r = C/\lambda_j$ and $r_1 = \lambda_{s_1}/\lambda_j$, $r_2 = \lambda_{s_2}/\lambda_j$. Then for a given value of $r$, $r_1 + r_2 = r$. Hence if finger stabilisations are done at rate $(1 - \beta) r$, the successor stabilisations need to be done at rate $\beta r$, where the parameter $\beta$ can be varied from 0 to 1.
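The budget split for periodic stabilisation can be sketched in a few lines. This is an illustrative sketch only: it assumes rates are expressed in units of the join rate (so the total budget is $r = C/\lambda_j$), and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodicRates:
    successor: float  # r_1 = beta * r, successor stabilisations
    finger: float     # r_2 = (1 - beta) * r, finger stabilisations


def split_periodic_budget(r: float, beta: float) -> PeriodicRates:
    """Split a total message budget r between successor and finger stabilisation."""
    if r < 0.0:
        raise ValueError("r must be non-negative")
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return PeriodicRates(successor=beta * r, finger=(1.0 - beta) * r)


if __name__ == "__main__":
    rates = split_periodic_budget(r=100.0, beta=0.4)
    # The two rates always add up to the full budget r.
    print(rates, rates.successor + rates.finger)
```

Keeping $r$ fixed and varying only $\beta$ is what makes the comparison fair: both strategies are then evaluated at the same total number of maintenance messages.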

In the case of correction-on-change, we need to impose the same maximum rate $C$ no matter which state the nodes are in. In this case, let $\lambda_{s_1}$ be the rate of successor stabilisation in state $S_1$, $\lambda_{s_2}$ the rate of successor stabilisation in state $S_2$ and $\lambda_{s_3}$ the rate of sending messages in state $S_2$. Clearly $\lambda_{s_1} = C$ and $\lambda_{s_2} + \lambda_{s_3} = C$. Defining $r$ as before, we get $\lambda_{s_1}/\lambda_j = r$ and $\lambda_{s_2}/\lambda_j + \lambda_{s_3}/\lambda_j = r$. Hence, comparing with our parameters, $\alpha = 1$ and $a + c = 1$.

In Fig. 5 we have plotted the function $L \approx A(1 + f + 3f^2)$, with the value of the lookup length without churn $A \approx 5.846$ for $N = 1000$ nodes, for $a = 0$ (and $c = 1$) and for $\beta = 0.4$. $f$ is calculated separately for the two maintenance techniques.

As can be seen, correction-on-change is better than periodic stabilisation when churn is low but not when churn is high. On comparing lookup lengths for several different $a$, it becomes evident (see Fig. 6) that $a \approx 0.2$ is the optimum value for the correction-on-change strategy.

So, interestingly, for nodes in state $S_2$, it is not the best strategy to increase $c$ as much as possible. It is a better strategy to spend some of the bandwidth on maintaining a correct successor.

VI. Summary 

In summary, we have demonstrated the usefulness of the 
master-equation approach for understanding churn in overlay 
networks. Our analysis can take into account most details of 
the algorithms used by these networks, to provide predictions 
for how the performance depends on the parameters. There are 
several directions in which we can extend the present analysis. 
One of the more important ones is to model congestion on the 
links. This could affect the performance of the two compared 
maintenance strategies differently. The periodic case may not be affected as much as the reactive case, which could suffer from congestion collapse.

Acknowledgments We would like to thank Ali Ghodsi for several very useful discussions.

References 

[1] Karl Aberer, P-Grid: A self-organizing access structure for p2p information systems, In Proceedings of the Sixth International Conference on Cooperative Information Systems (CoopIS 2001) (Trento, Italy), 2001.

[2] Karl Aberer, Anwitaman Datta, and Manfred Hauswirth, Efficient, self-contained handling of identity in peer-to-peer systems, IEEE Transactions on Knowledge and Data Engineering 16 (2004), no. 7, 858-869.

[3] Luc Onana Alima, Sameh El-Ansary, Per Brand, and Seif Haridi, DKS(N; k; f): A Family of Low Communication, Scalable and Fault-Tolerant Infrastructures for P2P Applications, The 3rd International Workshop On Global and Peer-To-Peer Computing on Large Scale Distributed Systems (CCGRID 2003) (Tokyo, Japan), May 2003.

[4] James Aspnes, Zoe Diamadi, and Gauri Shah, Fault-tolerant routing in peer-to-peer systems, Proceedings of the twenty-first annual symposium on Principles of distributed computing, ACM Press, 2002, pp. 223-232.

[5] Miguel Castro, Manuel Costa, and Antony Rowstron, Performance and dependability of structured peer-to-peer overlays, Proceedings of the 2004 International Conference on Dependable Systems and Networks (DSN'04), IEEE Computer Society, 2004.

[6] Ali Ghodsi, Luc Onana Alima, and Seif Haridi, Low-bandwidth topology maintenance for robustness in structured overlay networks, 38th International HICSS Conference, Springer-Verlag, 2005.

[7] Supriya Krishnamurthy, Sameh El-Ansary, Erik Aurell, and Seif Haridi, A statistical theory of Chord under churn, The 4th International Workshop on Peer-to-Peer Systems (IPTPS'05) (Ithaca, New York), February 2005.

[8] Supriya Krishnamurthy, Sameh El-Ansary, Erik Aurell, and Seif Haridi, An analytical study of a structured overlay in the presence of dynamic membership, IEEE Joint Transaction on Networking (2007).

[9] Jinyang Li, Jeremy Stribling, Thomer M. Gil, Robert Morris, and Frans Kaashoek, Comparing the performance of distributed hash tables under churn, The 3rd International Workshop on Peer-to-Peer Systems (IPTPS'02) (San Diego, CA), Feb 2004.

[10] Jinyang Li, Jeremy Stribling, Robert Morris, M. Frans Kaashoek, and Thomer M. Gil, A performance vs. cost framework for evaluating DHT design tradeoffs under churn, Proceedings of the 24th Infocom (Miami, FL), March 2005.

[11] David Liben-Nowell, Hari Balakrishnan, and David Karger, Analysis of the evolution of peer-to-peer systems, ACM Conf. on Principles of Distributed Computing (PODC) (Monterey, CA), July 2002.

[12] Sean Rhea, Dennis Geels, Timothy Roscoe, and John Kubiatowicz, Handling churn in a DHT, Proceedings of the 2004 USENIX Annual Technical Conference (USENIX '04) (Boston, Massachusetts, USA), June 2004.



Fig. 7. Theory and simulation for the lookup cost without churn, for a key space of size $\mathcal{K}$ and varying $N$ (horizontal axis: $1 - \rho = N/\mathcal{K}$). Curves: $L$ (without churn) from simulation; $L$ (without churn) from theory; $0.5 \log_2 N$ plotted as reference.

VII. Appendix

Equation (1) with the churn-dependent terms set to zero becomes:



$$C_{\xi+m} = C_{\xi}\,[1 - a(m)] + a(m) + \sum_{i=0}^{m-1} b(i)\, C_{m-i} \tag{10}$$



After some rewriting of this, it is easily seen that the cost for any key $i + 1$ can be written as the following recursion relation:

$$C_{i+1} = \rho\, C_i + (1 - \rho) + (1 - \rho)\, C_{i+1-\xi(i+1)} \tag{11}$$



Here we have used the definition of $a$ and $b$ from the internode-interval distribution, and the notation $\xi(i + 1)$ refers to the start of the finger most closely preceding $i + 1$. For instance, for $i + 1 = 4$, $\xi(i + 1) = 2$ and for $i + 1 = 11$, $\xi(i + 1) = 8$, etc.
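The recursion for the zero-churn costs can be iterated numerically. The sketch below assumes the recursion reads $C_k = \rho\, C_{k-1} + (1-\rho) + (1-\rho)\, C_{k-\xi(k)}$ with $C_1 = 1$, and that for base-2 Chord $\xi(k)$ is the largest power of two strictly below $k$ (which reproduces $\xi(4) = 2$ and $\xi(11) = 8$). All function names are hypothetical.

```python
from typing import List


def finger_start(key: int) -> int:
    """Start of the finger most closely preceding `key` (base-2 fingers at 1, 2, 4, ...)."""
    if key < 2:
        raise ValueError("key must be at least 2")
    return 1 << ((key - 1).bit_length() - 1)


def lookup_costs(rho: float, num_keys: int) -> List[float]:
    """Return [unused, C_1, ..., C_num_keys] from the zero-churn recursion."""
    costs = [0.0] * (num_keys + 1)
    costs[1] = 1.0
    for key in range(2, num_keys + 1):
        costs[key] = (
            rho * costs[key - 1]
            + (1.0 - rho)
            + (1.0 - rho) * costs[key - finger_start(key)]
        )
    return costs


def average_lookup_length(rho: float, num_keys: int) -> float:
    """Average of C_1 .. C_num_keys, taken here as the lookup length L."""
    return sum(lookup_costs(rho, num_keys)[1:]) / num_keys


if __name__ == "__main__":
    num_keys, num_nodes = 2 ** 14, 1000
    rho = 1.0 - num_nodes / num_keys  # assumes 1 - rho = N / K
    print(average_lookup_length(rho, num_keys))
```

For small $N/\mathcal{K}$ the printed value can be compared against the leading-order estimate $1 + \frac{1}{2}\log_2 N$ to get a feel for the size of the corrections.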

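We are interested in solving the recursion relation and computing the average $L$ of the costs $C_i$ over all keys. To do this, we decompose the sum $\sum_i C_i$ into the following partial sums:

$$\begin{aligned}
s_0 &= C_1 = 1 \\
s_1 &= C_2 \\
s_2 &= C_3 + C_4 \\
s_3 &= C_5 + C_6 + C_7 + C_8 \\
&\;\;\vdots
\end{aligned} \tag{12}$$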



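Substituting the expressions for the $C$'s in the above, we find:

$$\begin{aligned}
s_0 &= 1 \\
s_1 &= \frac{\rho}{1-\rho}\,[C_1 - C_2] + 1 + s_0 \\
s_2 &= \frac{\rho}{1-\rho}\,[C_2 - C_4] + 2 + [s_0 + s_1] \\
&\;\;\vdots \\
s_i &= \frac{\rho}{1-\rho}\,[C_{2^{i-1}} - C_{2^i}] + 2^{i-1} + \sum_{j=0}^{i-1} s_j
\end{aligned} \tag{13}$$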



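By substituting serially the expressions for $s_j$ (where $0 \le j \le i - 1$), the expression for $s_i$ (for $i \ge 2$) becomes:

$$s_i = \frac{\rho}{1-\rho}\left[C_{2^{i-1}} - C_{2^i} + \sum_{j=0}^{i-2} 2^{\,i-2-j}\left(C_{2^j} - C_{2^{j+1}}\right)\right] + 2^i + (i-1)\,2^{\,i-2} \tag{14}$$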

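Hence

$$\sum_{i=0}^{\mathcal{M}} s_i = -\rho + [2^{\mathcal{M}+1} - 1] + \mathcal{M}\, 2^{\mathcal{M}-1} - [2^{\mathcal{M}} - 1] + \frac{\rho}{1-\rho}\left[(2^{\mathcal{M}-1} - 1)\,C_1 - \sum_{i=2}^{\mathcal{M}-1} C_{2^i} - C_{\mathcal{K}} - (2^{\mathcal{M}-2} - 1)\,C_2 - (2^{\mathcal{M}-3} - 1)\,C_4 - \cdots\right] \tag{15}$$

Therefore

$$\sum_{i=0}^{\mathcal{M}} s_i = -\rho + 2^{\mathcal{M}} + \mathcal{M}\, 2^{\mathcal{M}-1} + \frac{\rho}{1-\rho}\left[(2^{\mathcal{M}-1} - 1)\,C_1 - \sum_{j=2}^{\mathcal{M}-1} (2^{\mathcal{M}-j} - 1)\,C_{2^{j-1}} - \sum_{i=2}^{\mathcal{M}-1} C_{2^i} - C_{\mathcal{K}}\right] \tag{16}$$

with $s_{\mathcal{M}} = C_{2^{\mathcal{M}-1}+1} + \cdots + C_{\mathcal{K}}$.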
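The equation for the average lookup length without churn is thus

$$L = \frac{1}{\mathcal{K}} \sum_{i=0}^{\mathcal{M}} s_i \tag{17}$$

If we take the limit $\mathcal{K} \to \infty$, we can throw away some of the terms:

$$\lim_{\mathcal{K}\to\infty} L = 1 + \frac{1}{2}\mathcal{M} + \frac{\rho}{1-\rho}\left[\frac{C_1}{2} - \sum_{i=1}^{\mathcal{M}-2} \frac{C_{2^i}}{2^{i+1}}\right] = 1 + \frac{1}{2}\mathcal{M} + \frac{\rho}{1-\rho}\left[\frac{C_1}{2} - \frac{C_2}{4} - \frac{C_4}{8} - \cdots - \frac{C_{2^{\mathcal{M}-2}}}{2^{\mathcal{M}-1}}\right] \tag{18}$$

Since $C_1 = 1$, we can write

$$L = 1 + \frac{1}{2}\mathcal{M} - \frac{\rho}{1-\rho}\left[\frac{C_2 - 1}{4} + \frac{C_4 - 1}{8} + \cdots + \frac{C_{2^{\mathcal{M}-2}} - 1}{2^{\mathcal{M}-1}}\right] \tag{19}$$

From the recursion relation for the $C_i$'s, it is easy to see that

$$C_i - 1 = (1 - \rho)\, g_1^{(i)}(\rho) + (1 - \rho)^2\, g_2^{(i)}(\rho) + \cdots \tag{20}$$

where the $g$'s are functions only of $\rho$.

Hence if $(1 - \rho) = N/\mathcal{K}$ is small ($\to 0$), we need only compute the $C_i$'s to first order in $(1 - \rho)$ to get the leading order effect, to second order in $(1 - \rho)$ to get the correction, etc. Hence in general, the expression for $L$ is:

$$L = 1 + \frac{1}{2}\mathcal{M} - \frac{\rho}{2}\left[e_1(\rho) + (1 - \rho)\, e_2(\rho) + (1 - \rho)^2\, e_3(\rho) + \cdots\right] \tag{21}$$

where $e_1(\rho) = \sum_{i=1}^{\mathcal{M}-2} \frac{1}{2^i}\, g_1^{(2^i)}(\rho)$, etc.

We evaluate this expression numerically by solving recursion relation (11) and compare it with simulations done at zero churn. As can be seen, the prediction of the equation is very accurate (Fig. 7).

Let us now compute $e_1(\rho)$ to see what the leading order effect is. We now need to solve recursion relation (11) only to order $1 - \rho$, which gives:

$$\begin{aligned}
C_2 - 1 &= (1 - \rho) \\
C_4 - 1 &= (1 - \rho)\,[1 + \rho + \rho^2] \\
C_8 - 1 &= (1 - \rho)\,[1 + \rho + \rho^2 + \cdots + \rho^6]
\end{aligned} \tag{22}$$

Therefore,

$$L = 1 + \frac{1}{2}\mathcal{M} - \frac{\rho}{2}\left[\frac{1}{2} + \frac{1 + \rho + \rho^2}{4} + \frac{1 + \rho + \cdots + \rho^6}{8} + \cdots\right] \tag{23}$$

Consider the expression inside the brackets. We are computing this in the approximation $N/\mathcal{K} = \epsilon \to 0$, i.e. $\rho = 1 - \epsilon$; therefore $\rho^x = (1 - \epsilon)^x$. If $x > 1/\epsilon$, then $\rho^x \to 0$; therefore if $x > \mathcal{K}/N$, then $\rho^x \to 0$. Hence, the terms inside the brackets become:

$$\sum_{j=1}^{T} \frac{2^j - 1}{2^j} + \sum_{j=T+1}^{\mathcal{M}-2} \frac{2^T}{2^j} \tag{24}$$

where $T = \log_2 \mathcal{K} - \log_2 N$ and we have put $\rho^x \approx 1$ for $x < \mathcal{K}/N$ and $\rho^x \approx 0$ for $x > \mathcal{K}/N$. This is clearly an overestimation and so we expect the result to overestimate the exact expression.

Expression (24) becomes, to leading order in $N/\mathcal{K}$ and $1/N$, simply $T$. Therefore, with $\rho \approx 1$ in the prefactor:

$$L = 1 + \frac{1}{2}\log_2 \mathcal{K} - \frac{1}{2}\left[\log_2 \mathcal{K} - \log_2 N\right] = 1 + \frac{1}{2}\log_2 N \tag{25}$$

which is the known result for the average lookup length of Chord.

Another important parameter in the performance of DHTs in general is the base. By increasing the base, the number of fingers per node increases, which leads to a shorter lookup path length. The effect of varying the base has been studied in [3], [10]. So far, we have considered base-2 Chord in this analysis. We can likewise carry out this analysis for any base.

In general, we have base-$b$ with $(b - 1)\log_b(\mathcal{K})$ fingers per node. Consider as an example $b = 4$. Here we can define the partial sums again in the following manner:

$$\begin{aligned}
\Delta_0 &= s_0 = C_1 = 1 \\
\Delta_1 &= s_1 + s_2 + s_3 \\
\Delta_2 &= s_4 + s_5 + s_6 \\
&\;\;\vdots
\end{aligned} \tag{26}$$

where

$$\begin{aligned}
s_1 &= C_2 = \rho\, C_1 + (1 - \rho) + (1 - \rho)\, C_1 \\
s_2 &= C_3 = \rho\, C_2 + (1 - \rho) + (1 - \rho)\, C_1 \\
s_3 &= C_4 = \rho\, C_3 + (1 - \rho) + (1 - \rho)\, C_1 \\
s_4 &= C_5 + C_6 + C_7 + C_8 \\
s_5 &= C_9 + C_{10} + C_{11} + C_{12} \\
s_6 &= C_{13} + C_{14} + C_{15} + C_{16}
\end{aligned}$$

Therefore

$$\begin{aligned}
\Delta_0 &= C_1 \\
\Delta_1 &= \rho\,[\Delta_1 + C_1 - C_4] + 3(1 - \rho) + 3(1 - \rho)\,[\Delta_0] \\
\Delta_2 &= \rho\,[\Delta_2 + C_4 - C_{16}] + 12(1 - \rho) + 3(1 - \rho)\,[\Delta_0 + \Delta_1] \\
&\;\;\vdots
\end{aligned} \tag{27}$$

and, to first order in $(1 - \rho)$,

$$C_i - 1 = (1 - \rho)\,[1 + \rho + \rho^2 + \cdots + \rho^{\,i-2}] \tag{28}$$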
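In general for a base $b$, define $B = b - 1$ and $b^{\mathcal{M}} = \mathcal{K}$. Then we have:

$$\Delta_j = \frac{\rho}{1-\rho}\left[C_{b^{j-1}} - C_{b^j}\right] + B\, b^{\,j-1} + B\left[\Delta_0 + \Delta_1 + \cdots + \Delta_{j-1}\right] \tag{29}$$

Following much the same procedure as before, we find

$$L = \frac{1}{\mathcal{K}} \sum_{j=0}^{\mathcal{M}} \Delta_j = 1 + \frac{B}{B+1}\,\mathcal{M} - \frac{\rho\, B}{1-\rho}\left[\frac{C_b - 1}{(B+1)^2} + \frac{C_{b^2} - 1}{(B+1)^3} + \cdots\right] \tag{30}$$

for $\mathcal{K} \to \infty$ as the analogue of (19). Again we can simplify and slightly overestimate the sum by assuming that $\rho^x \approx 0$ for $x > \mathcal{K}/N$ and $\rho^x \approx 1$ for $x < \mathcal{K}/N$. Then we get:

$$L \approx 1 + \frac{b - 1}{b}\,\frac{\log_2 N}{\log_2 b} \tag{31}$$

This is the analogue of Eq. (25) for any base $b$.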
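The base-$b$ case can be explored numerically in the same spirit as base 2. The sketch below assumes that in base $b$ the fingers start at $j\, b^l$ for $j = 1, \ldots, b-1$ (so for $b = 4$: 1, 2, 3, 4, 8, 12, 16, ...), that the cost recursion keeps the form $C_k = \rho\, C_{k-1} + (1-\rho) + (1-\rho)\, C_{k-\xi(k)}$, and that the leading-order estimate is $1 + \frac{b-1}{b}\log_b N$. The function names are hypothetical.

```python
import math
from typing import List


def finger_start_base(key: int, base: int) -> int:
    """Largest finger start j * base**l (1 <= j <= base - 1) strictly below `key`."""
    if base < 2 or key < 2:
        raise ValueError("need base >= 2 and key >= 2")
    power = 1
    while power * base <= key - 1:
        power *= base
    return ((key - 1) // power) * power


def lookup_costs_base(rho: float, num_keys: int, base: int) -> List[float]:
    """Return [unused, C_1, ..., C_num_keys] for a base-`base` finger table."""
    costs = [0.0] * (num_keys + 1)
    costs[1] = 1.0
    for key in range(2, num_keys + 1):
        xi = finger_start_base(key, base)
        costs[key] = rho * costs[key - 1] + (1.0 - rho) * (1.0 + costs[key - xi])
    return costs


def leading_order_estimate(num_nodes: int, base: int) -> float:
    """Leading-order lookup length 1 + ((b - 1) / b) * log2(N) / log2(b)."""
    return 1.0 + (base - 1) / base * math.log2(num_nodes) / math.log2(base)


if __name__ == "__main__":
    for b in (2, 4, 16):
        print(b, leading_order_estimate(1024, b))
```

Averaging `lookup_costs_base` over all keys for a small value of $N/\mathcal{K}$ and comparing with `leading_order_estimate` gives a direct check of how much the simplified sum overestimates the exact recursion.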



